Scholar fine-grained information extraction method fused with local semantic features
Yuelin TIAN, Ruizhang HUANG, Lina REN
Journal of Computer Applications    2023, 43 (9): 2707-2714.   DOI: 10.11772/j.issn.1001-9081.2022091407

Extracting fine-grained scholar information, such as research directions and education experience, from scholar homepages plays an important role in fields such as the construction of large-scale professional talent pools. To address the problem that existing scholar fine-grained information extraction methods cannot effectively exploit contextual semantic associations, a scholar fine-grained information extraction method fusing local semantic features was proposed, which extracts fine-grained information from scholar homepages by using the semantic associations in the local text. Firstly, general semantic representations were learned with the whole-word-masking Chinese pre-trained model RoBERTa-wwm-ext. Then, the representation vector of the target sentence and the representation vectors of its locally adjacent text, taken from the general semantic embeddings, were jointly fed into a Convolutional Neural Network (CNN) for local semantic fusion, yielding a higher-dimensional representation vector for the target sentence. Finally, this representation vector was mapped from the high-dimensional space to the low-dimensional label space to extract the fine-grained information from the scholar homepage. Experimental results show that the proposed method achieves a micro-averaged F1 score of 93.43%, which is 8.60 percentage points higher than that of the RoBERTa-wwm-ext-TextCNN method without local semantic fusion, verifying the effectiveness of the proposed method on the scholar fine-grained information extraction task.
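A minimal PyTorch sketch of the local-semantic-fusion step described above is given below; the window size, dimensions, and module structure are illustrative assumptions rather than the authors' released implementation.

# Sketch: fuse the target sentence vector with its neighbors via a 1-D CNN,
# then map the fused representation to the fine-grained label space.
import torch
import torch.nn as nn

class LocalSemanticFusion(nn.Module):
    def __init__(self, hidden=768, n_labels=10, window=3, channels=128):
        super().__init__()
        # 1-D convolution over the (target + locally adjacent) sentence vectors
        self.conv = nn.Conv1d(hidden, channels, kernel_size=window, padding=window // 2)
        self.pool = nn.AdaptiveMaxPool1d(1)
        # map the fused vector to the low-dimensional label space
        self.classifier = nn.Linear(channels, n_labels)

    def forward(self, sent_vecs):
        # sent_vecs: (batch, window, hidden) -- the target sentence plus its
        # neighboring sentences, e.g. encoded by RoBERTa-wwm-ext
        x = sent_vecs.transpose(1, 2)      # (batch, hidden, window)
        x = torch.relu(self.conv(x))       # local semantic fusion via CNN
        x = self.pool(x).squeeze(-1)       # (batch, channels)
        return self.classifier(x)          # logits over fine-grained labels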

Hierarchical storyline generation method for hot news events
Dong LIU, Chuan LIN, Lina REN, Ruizhang HUANG
Journal of Computer Applications    2023, 43 (8): 2376-2381.   DOI: 10.11772/j.issn.1001-9081.2022091377

Hot news events develop in rich ways, and each stage of their development has its own narrative; as an event unfolds, its storyline tends to evolve hierarchically. Aiming at the poor interpretability and insufficient hierarchy of storylines produced by existing storyline generation methods, a Hierarchical Storyline Generation Method (HSGM) for hot news events was proposed. First, an improved hotword algorithm was used to select the main seed events and construct the trunk of the storyline. Second, the hotwords of branch events were selected to improve the interpretability of the branches. Third, within each branch, a storyline coherence selection strategy fusing hotword relevance and a dynamic time penalty was used to strengthen the connections between parent and child events, so as to build hierarchical hotwords and, in turn, a multi-level storyline. In addition, considering the incubation period of hot news events, a hatchery was added to the storyline construction process to avoid neglecting initial events whose hotness is still low. Experimental results on two real self-constructed datasets show that, in the event tracking process, compared with the methods based on singlePass and k-means, HSGM improves the F score by 4.51% and 6.41%, and by 20.71% and 13.01%, respectively; in the storyline construction process, HSGM outperforms Story Forest and Story Graph in accuracy, comprehensibility and integrity on both datasets.
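The coherence selection strategy can be illustrated with a small sketch that fuses hotword relevance with a dynamic time penalty; the Jaccard relevance, exponential decay, and weighting below are assumptions made for illustration, not the paper's exact formulation.

# Sketch: score how coherently a candidate child event follows a parent event.
import math

def hotword_relevance(parent_hotwords: set, child_hotwords: set) -> float:
    # Jaccard overlap between the hotword sets of parent and candidate child events
    if not parent_hotwords or not child_hotwords:
        return 0.0
    inter = len(parent_hotwords & child_hotwords)
    union = len(parent_hotwords | child_hotwords)
    return inter / union

def coherence(parent_hotwords, child_hotwords, delta_days, alpha=0.7, tau=7.0):
    # Dynamic time penalty: the larger the time gap between parent and child
    # events, the weaker the assumed coherence.
    time_penalty = math.exp(-delta_days / tau)
    return alpha * hotword_relevance(parent_hotwords, child_hotwords) + (1 - alpha) * time_penalty

# Each new event would be attached to the parent that maximizes this score.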

DDDC: deep dynamic document clustering model
Hui LU, Ruizhang HUANG, Jingjing XUE, Lina REN, Chuan LIN
Journal of Computer Applications    2023, 43 (8): 2370-2375.   DOI: 10.11772/j.issn.1001-9081.2022091354

The rapid development of the Internet has led to explosive growth in news data, and how to capture the topic evolution process of popular current events from massive news data has become a hot research topic in the field of document analysis. However, the commonly used traditional dynamic clustering models are inflexible and inefficient on large-scale datasets, while existing deep document clustering models lack a general way to capture the topic evolution of time-series data. To address these problems, a Deep Dynamic Document Clustering (DDDC) model was designed. Based on existing deep variational inference algorithms, the model captures, on each time slice, topic distributions that incorporate the content of previous time slices, and recovers the evolution process of event topics from these distributions through clustering. Experimental results on real news datasets show that, compared with the Dynamic Topic Model (DTM), Variational Deep Embedding (VaDE) and other algorithms, the DDDC model improves clustering accuracy by an average of at least 4 percentage points and Normalized Mutual Information (NMI) by at least 3 percentage points on each time slice of different datasets, verifying the effectiveness of the DDDC model.
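As a rough illustration of the dynamic idea, the sketch below infers each time slice's topic distribution with the previous slice's distribution mixed in as a prior and clusters documents by their dominant topic; the variational machinery of DDDC is simplified away, and all sizes and mixing weights are assumptions.

# Sketch: per-slice topic inference conditioned on the previous slice.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SliceEncoder(nn.Module):
    def __init__(self, vocab=2000, n_topics=20, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_topics))

    def forward(self, bow, prev_topic_prior=None, mix=0.3):
        # bow: (batch, vocab) bag-of-words of documents in the current time slice
        logits = self.encoder(bow)
        theta = F.softmax(logits, dim=-1)          # per-document topic distribution
        if prev_topic_prior is not None:
            # incorporate the previous time slice's topic distribution as a prior
            theta = (1 - mix) * theta + mix * prev_topic_prior
        clusters = theta.argmax(dim=-1)            # cluster assignment per document
        return theta, clusters

# Tracking theta.mean(0) across successive slices gives a view of topic evolution.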

Structured deep text clustering model based on multi-layer semantic fusion
Shengwei MA, Ruizhang HUANG, Lina REN, Chuan LIN
Journal of Computer Applications    2023, 43 (8): 2364-2369.   DOI: 10.11772/j.issn.1001-9081.2022091356

In recent years, owing to the advantages of structural information captured by Graph Neural Networks (GNNs) in machine learning, GNNs have begun to be combined with deep text clustering. However, current deep text clustering algorithms that incorporate a GNN ignore the decoder's important role in semantic complementation when fusing text semantic information, resulting in a lack of semantic information in the data generation part. In response to this problem, a Structured Deep text Clustering Model based on multi-layer Semantic fusion (SDCMS) was proposed. In this model, a GNN was utilized to integrate structural information into the decoder, the representation of text data was enhanced through layer-by-layer semantic complementation, and better network parameters were obtained through a triple self-supervision mechanism. Results of experiments on five real datasets, Citeseer, Acm, Reuters, Dblp and Abstract, show that compared with the current best Attention-driven Graph Clustering Network (AGCN) model, SDCMS improves accuracy, Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) by up to 5.853%, 9.922% and 8.142%, respectively.
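The decoder-side structural fusion can be sketched as follows: each decoder layer's output is complemented with a graph-convolved version of itself before being passed on; the layer sizes, fusion weight, and plain GCN layer below are assumptions, not the exact SDCMS architecture.

# Sketch: a decoder whose layer-by-layer reconstruction is complemented with
# structural information from a document graph.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.lin = nn.Linear(dim_in, dim_out)

    def forward(self, x, adj_norm):
        # adj_norm: normalized adjacency matrix of the document graph
        return torch.relu(adj_norm @ self.lin(x))

class StructuredDecoder(nn.Module):
    def __init__(self, dims=(32, 256, 2000), fuse=0.5):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(a, b) for a, b in zip(dims[:-1], dims[1:]))
        self.gnns = nn.ModuleList(GraphConv(b, b) for b in dims[1:])
        self.fuse = fuse

    def forward(self, z, adj_norm):
        # z: latent document representations; reconstruct layer by layer,
        # complementing each layer's semantics with structural information.
        h = z
        for lin, gnn in zip(self.layers, self.gnns):
            h = torch.relu(lin(h))
            h = (1 - self.fuse) * h + self.fuse * gnn(h, adj_norm)
        return h   # reconstructed text features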
